SITE FINDING

FIELD OF THE INVENTION

The present invention relates to searching for information on a data network, and
especially to searching utilizing an analysis of the results of search engines.

BACKGROUND OF THE INVENTION

It is known in the art to analyze data networks, such as journals and journal citations, to
determine meta knowledge about the field.

IBM Inc. described a method of determining hubs and authorities on the Internet, in
patent 5,884,305, in a US patent application number 08/813,749 filed March 7, 1997,
mentioned in the patent, and in "Authoritative Sources in a Hyperlinked Environment", by Jon
M. Kleinberg, in IBM research report RJ 10076 (91892), topic area "Computer Science", May
29, 1997, the disclosures of which are incorporated herein by reference. Hubs are Internet sites
that contain links to many other sites in a same field and authorities are sites that are pointed to
by a significant number of relevant sites in a field. An iterative process was suggested to
determine, from among a predetermined set of sites, a kernel of sites that match a hub or
authority definition. In the Kleinberg paper, it is noted that the Internet is to be considered a
different type of data network than journal articles.

A paper entitled "Mining the Web's Link Structure", by S. Chakrabarti et al., in IEEE
Computer, August 1999, the disclosure of which is incorporated herein by reference, describes
analyzing link structures of WWW pages to determine hubs and authorities. At a site
"http://www.google.com", available on February 1, 2000 and for some time before, a tool
"googlescout" is suggested for detecting WWW sites that are similar to a shown site, for
example for finding competition.

A WWW page "www.cgl.uwaterloo.ca/Projects/Vanish/webquery-1.html", apparently
available at least from December 11, 1996, the disclosure of which is incorporated herein by
reference, describes the "webquery" project, in which a quality of a site that turns up in a
search is evaluated based on the number of sites linked to the site and the number of sites
linked to from the site.

SUMMARY OF THE INVENTION

An object of some embodiments of the invention is finding one or more hub sites, or
lists of WWW pages, that cover a topic presented by a set of input sites. In an embodiment of
the invention, the hubs or page lists are selected by virtue of their including links to a
significant number of the sites in the set of the input sites. An expected advantage of using
hubs is that each hub may concentrate in it a large number of links to relevant sites, beyond
those provided in the input set, and also include additional information which can help a
human user select certain sites for browsing.

An aspect of some embodiments of the invention relates to selecting a potential hub
based on a statistical analysis of an Internet link structure, for example, using an
approximation of a number of links from the potential hub to a set of input sites, rather than
determining which sites from the input set are actually pointed to. In one embodiment of the
invention, this determination is made by searching for potential hubs that include links to
groups of input sites and then ranking the resulting potential hubs, based on the number of
groups pointed to by each potential hub. As a potential hub might include links to more than
one site in an input group, the approximation may be significantly different from the actual
number of links between a potential hub and individual member sites of input groups. It is
noted that in some embodiments, there is no final determination of which particular site is
pointed to by the potential hub.

An aspect of some embodiments of the invention relates to a method of automatically
determining a hub-potential of a site, for example for ranking hubs in a set of potential hubs or
for finding potential hubs in a search. In one embodiment of the invention, a hub-potential of a
site is determined based on structural properties of the site, for example, the existence of a list
of links and/or the existence of a text paragraph (e.g., a review or description) for many of the
links. Optionally, the number of links is determined by counting the occurrences of phrases
which indicate the presence of links, such as "http:" or "href". Alternatively or additionally, a
hub-potential may be determined based on the usage of key terms of the topic in the site in
general and/or in anchor portions of the site in particular, such as a main title or a section
heading. Alternatively or additionally, a hub-potential may be determined based on a usage of
hub-typical words or phrases, such as "list of links", "links", "index", "list", "compilation"
and/or "resources". Optionally, these words or phrases receive a higher scoring based on their
location in the site, for example in a title or before a long list of links.
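The hub-potential determination described above can be sketched as follows. This is only an illustrative sketch: the function names, the integer weights (2 for a match in the title, 1 for a match elsewhere) and the choice to take the larger of the two substring counts are assumptions, as the text names the indicators but not how they are combined.

```python
# Hub-typical phrases named in the text; the weights used below are assumed.
HUB_PHRASES = ("list of links", "links", "index", "list", "compilation", "resources")


def count_links(body: str) -> int:
    """Approximate the number of links by counting link-indicating substrings."""
    lowered = body.lower()
    # A link written as href="http:..." contains both substrings, so the
    # larger count is used instead of the sum (an assumed simplification).
    return max(lowered.count("href"), lowered.count("http:"))


def hub_potential(title: str, body: str, topic_words: list[str]) -> int:
    """Score a page: link count plus bonuses for hub phrases and topic words."""
    title_l, body_l = title.lower(), body.lower()
    score = count_links(body)
    for term in list(HUB_PHRASES) + [w.lower() for w in topic_words]:
        if term in title_l:
            score += 2  # assumed higher weight for an anchor portion (title)
        elif term in body_l:
            score += 1  # assumed lower weight for the general text
    return score
```

A page titled "Index of jazz links" that contains several links would score well above a page that merely mentions the topic word in running text.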

In one embodiment of the invention, the potential hubs are ranked and/or filtered
before being analyzed in greater depth. Alternatively or additionally, the hub generation
process may create a small set of potential hubs to begin with, for example using a threshold
setting. Such ranking may include, for example, selecting only a subset of those sites that point
to the input set of sites, for example based on the existence of a topic word in those sites, prior
to analyzing the sites for hub-potential. In another example, potential hubs that are found using
a search engine are required to include both a topic word and at least one link to one group of
sites from the input set.


197/01184 

In an embodiment of the invention, hub potential is characterized by rules, which may
be phrased in a search engine command language, so a search for the hubs using the search
engine returns sites with a higher potential of being desired hubs. In an embodiment of the
invention, the particular features of a search engine, for example, searching for URLs or links,
disjunctive search and/or pipes, are used to perform one or more of the above activities, for
example, group comparison, rule application and/or thresholding of potential hubs, more
efficiently.

In one embodiment of the invention, an input set of sites is generated by a user
providing a topic or topic words and generating, for example by one or more search engine(s)
and/or Internet indexes, a list of sites relevant to that topic. Optionally, the list of sites is
filtered prior to being used as a basis for finding hubs, for example by removing redundant
and/or mirroring sites.

Alternatively or additionally, an input set of sites is generated from a user provided
site. The user provided site can be analyzed to find a second set of sites that is similar to the
provided site. One exemplary method of determining similarity is by finding hubs as defined
above which point to the site and selecting links from those hubs as similar sites. Another
exemplary method is to receive a short list of examples for such similar sites. Another
exemplary method is finding sites that contain similar text to the provided site. Optionally, the
user provides a set of sites, rather than a single site.

Optionally, hubs that point to the similar sites and not to the provided sites are
determined. In some embodiments, these hubs are treated as hubs to which a link to the
provided site should be added, for example by suggestion to the hub operators.

Alternatively or additionally, an input set of sites is generated by analyzing a user
provided hub or a hub obtained from previous use of hub-finder or a hub constructed by
combining search results/analysis of existing hubs from other user provided information.

Alternatively or additionally to providing a hub as an input, a list of a user's favorite
bookmarks or recently or frequently traveled sites may be used as an input instead. Such lists
may be considered to comprise a profile of a user, for example for advertisement targeting or
for finding friends or partners. Such a user profiling tool can be used, in some embodiments of
the invention, to extrapolate from an existing, studied group of users to a large group which is
not studied in detail but whose browsing habits are known.

A set of sites may be filtered, manually or automatically, prior to being used as an
input set, for example, a user manually selecting a subset of links or a topic word for use in
analyzing the suitability of the links.

Optionally, the resulting hubs are considered a set of hubs which are similar to the
input hub or at least an aspect of the input hub, and may thus be presented to a user.

In one embodiment of the invention, a set of similar hubs is analyzed, to harvest
information which may be useful, for example to the owner of the provided hub. In one
example, the links of the similar hubs are collated, filtered and/or ranked, to detect links or
textual descriptive material of links that are missing from the input hub and might be desirable.
In another example, links that exist in the provided hub are ranked based on the particulars of
the appearance of such links in the similar hubs. In another example, a new hub is created,
possibly ad-hoc, based on the analyzed similar hubs.

The similar hubs that are found may be real hubs searched for in the Internet.
Alternatively to finding Internet hubs, interest hubs of users may be determined. A database of
users' browsing habits or favorite links may be considered as hubs, one for each user. The
search for hubs then comprises searching in this database for users whose interest hubs are
relevant to a provided set of input hubs. The expansion of sites into hubs may be performed on
the Internet, in which case the found hubs reflect the common association of links. These hubs
may be used to find links that exist in the database of user habits. Alternatively, the
expansion of sites into hubs is performed in the user habits database, in which case the found
hubs reflect the preferences of the particular users in the database. A similarity between user
browsing habits (or favorite links) and hub sites, which may be noted, is that both are lists of
links that are organized by a thinking being to reflect a particular thought, topic or personality.

An aspect of some embodiments of the invention relates to a method of presenting a
list of hub sites. Alternatively or additionally to providing a list of sites, the sites may be
provided along with auxiliary information, for example, information about link structure, such
as number of links, number of unique links (not in other pages), number of popular links (on at
least k pages), amount of explanation for each link, method of ordering of links in the page
(alphabetic, topical, regional, ranked, etc.), information copied from the target pages, such as
the links themselves and/or explanations about the links. Copied information may be collated,
for example, by target link (or equivalent links), or grouped according to other criteria, such as
length, alphabetic, topic, rank, region and/or repetition.

There is thus provided in accordance with an exemplary embodiment of the invention,
a method of finding WWW pages, each of which includes at least one list of links to desired
Internet resources, comprising:

providing a list of URLs;

automatically generating at least one query for an Internet search tool for WWW pages
that include links to at least one URL of said list of URLs;

executing said at least one generated query to provide search results that include at
least one of said searched for WWW pages; and

generating a response comprising at least one indication of one of said WWW pages,
responsive to said search results. Optionally, the method comprises displaying said response to
a user. Alternatively or additionally, said at least one URL comprises a plurality of URLs.
Alternatively or additionally, said response is generated using a single search step and no
iterations. Alternatively or additionally, said method comprises ranking said search results.
Optionally, ranking of a WWW page is responsive to a number of groups of URLs pointed to
by said WWW page.

In an exemplary embodiment of the invention, said generating at least one search
query comprises:

dividing said list of URLs into a plurality of groups and generating at least a single
query for each group, wherein said at least a single query does not differentiate which URL in
said group is pointed to by the results of the search,

wherein said executing comprises executing said generated at least one query for a
plurality of said groups, generating a plurality of result lists. Optionally, all of said groups
have a same number of members. Alternatively, at least three of said groups have a different
number of members from each other.

In an exemplary embodiment of the invention, the method comprises collating said
result lists into a single list of search results. Optionally, the method comprises ranking the
contents of at least one of said result lists. Optionally, said collating is responsive to said
ranking of said at least one of said result lists. Alternatively or additionally, said ranking is
applied to said result list after it is generated. Optionally, the method comprises filtering said
at least one result list responsive to said ranking.

In an exemplary embodiment of the invention, said ranking is applied to said result list
during said execution. Optionally, said ranking is applied by adding at least one limitation to
said at least one generated search query.

In an exemplary embodiment of the invention, said ranking comprises ranking
responsive to a number of said URLs pointed to by said result list. Alternatively or
additionally, said ranking comprises ranking responsive to a morphological property of pages
of said at least one result list. Optionally, said morphological property comprises the existence
of a link list.

In an exemplary embodiment of the invention, said ranking indicates a probability of a
ranked page being a hub. Alternatively or additionally, said ranking comprises ranking
responsive to the presence of at least one key word in pages of said at least one result list.
Optionally, said key word comprises a word that is related to a content of said list of URLs.
Alternatively or additionally, said key word comprises a word that serves as a statistical
indicator that the page is a hub. Optionally, said key word is selected from the group "links",
"index" and "resource".

In an exemplary embodiment of the invention, said providing comprises a user
providing a list of URLs. Optionally, said user provided list of URLs comprises at least a part
of a URL bookmark file.

In an exemplary embodiment of the invention, a method according to claim 1, wherein
said providing comprises a user providing a WWW page including a list of URLs.
Alternatively or additionally, said providing comprises:

a user providing one or more topic words; and

executing a preliminary search to find a list of URLs related to said one or more topic
words. Alternatively or additionally, said providing comprises:

a user providing a WWW page; and

executing a preliminary search to find a list of URLs that point to pages similar to the
provided WWW page. Optionally, said executing said at least one generated query comprises
executing said at least one query to ignore WWW pages that include links to said user
provided URL.

In an exemplary embodiment of the invention, the method comprises filtering said
search results before said generating. Alternatively or additionally, said search tool comprises
a search engine. Optionally, said executing said at least one query comprises executing using a
pipe feature of said search engine to limit a second search step to a list of sites found in a first
search step using said search engine.

In an exemplary embodiment of the invention, said response comprises a list of said
WWW pages. Optionally, said response includes link statistics for said WWW pages.
Optionally, said link statistics include a number of links in each WWW page. Alternatively or
additionally, said link statistics include an indicator of a uniqueness of links in each WWW
page. Alternatively or additionally, said link statistics include an indicator of an amount of
information associated with links in each WWW page.

In an exemplary embodiment of the invention, said response comprises a list of links
listed in at least one of said WWW pages. Optionally, said response comprises a list of links
listed in at least a given number of said WWW pages. Optionally, said given number is greater
than 1. Alternatively, said given number is greater than 2.

In an exemplary embodiment of the invention, said list is arranged by WWW pages.
Alternatively or additionally, said list comprises information associated with a link in its
corresponding WWW page. Alternatively or additionally, said list indicates pages not
including a link to any URL in a predetermined list of URLs. Alternatively or additionally,
said list indicates pages not including a link from the contents of any URL in a predetermined
list of URLs. Optionally, said predetermined list is provided by a user.

There is also provided in accordance with an exemplary embodiment of the invention,
a method of finding WWW pages, each of which includes at least one list of links to desired
Internet resources, comprising:

providing at least one URL;

generating a list of URLs related to said at least one URL;

determining at least one WWW page that includes links to at least one URL of said list
of URLs but not to said provided at least one URL; and

generating a response comprising at least one indication of one of said at least one
WWW page. Optionally, the method comprises displaying said response to a user.
Alternatively or additionally, said at least one WWW page comprises a plurality of WWW
pages. Optionally, said providing comprises providing a WWW page including a link
to said at least one URL.

In an exemplary embodiment of the invention, said providing comprises providing a
list of a plurality of URLs. Alternatively or additionally, generating a list of related URLs
comprises generating a list of competition URLs. Alternatively or additionally, generating a
list of related URLs comprises generating a list of similar URLs. Alternatively or additionally,
generating a list of related URLs comprises finding WWW pages characterized in that a
common WWW page includes links to at least one of said WWW pages and at least one of
said at least one URL. Alternatively or additionally, said determining comprises executing a
query on a search engine.

BRIEF DESCRIPTION OF THE DRAWINGS

Particular embodiments of the invention will be described with reference to the
following description of some embodiments of the invention in conjunction with the figures,
wherein identical structures, elements or parts which appear in more than one figure are
optionally labeled with a same or similar number in all the figures in which they appear, in
which:

Fig. 1 is a schematic illustration of a configuration of a search engine in accordance
with an exemplary embodiment of the invention;

Fig. 2 is a flowchart of a method for finding hubs, in accordance with an exemplary
embodiment of the invention; and

Fig. 3 is a flowchart of a method of finding sites similar to a provided list of sites, in
accordance with an exemplary embodiment of the invention.

DETAILED DESCRIPTION OF SOME EMBODIMENTS

GENERAL

Fig. 1 is a schematic illustration of a configuration 100 of a search engine 106 in
accordance with an exemplary embodiment of the invention. A user 102 uses search engine
106 for finding sites of interest on an Internet 104. The connection to search engine 106 is
typically also through Internet 104, but this is not required. Typically, search engine 106
utilizes a database 108 that contains indexes and other information relating to WWW pages
known to search engine 106. In a typical search engine, a user provides terms and the engine
responds with a list of sites that include some of the terms. Some more advanced search engines
also provide sites that appear to be related for various reasons. In some embodiments of the
invention, a directory including an index of which sites link to which other sites is used as a
search tool.

A search engine result analyzer 110 is optionally provided, to analyze the results of the
search of index 108 by search engine 106 and to provide analyzed results to user 102.
Optionally, as will be described below, analyzer 110 also executes particular searches on
search engine 106. Although result analyzer 110 is optionally configured to work best with a
particular search engine 106, a same analyzer can work with a plurality of search engines.

An analysis of search results is generally desired as search engines do not typically
provide a single or small number of exactly matching sites; rather, based on keywords or
subject fields, a large number of sites that might be suitable are provided. Wading through a
long list of sites is extremely time consuming. One reason for this required wading is the lack
of suitable software for determining if a particular site is really relevant to user 102. Also,
valuable sites are often missed. Even indexing sites, such as Yahoo!, which use human
indexers, often do not supply a suitable site, for several reasons, including: (a) not being up to
date; (b) lack of coverage over much of the Internet; (c) lack of suitable manpower and/or time
for such manpower to cover all the myriad subjects on the Internet; and (d) lack of a suitable
index structure.

Typical reasons that a user browses the Internet for information include:

(a) searching for an answer to a particular question, optionally answered by an
authority on that question;

(b) looking for an overview of a particular field; and

(c) searching for a set of sites, from which the user can derive his or her own
conclusions.

The inventors of the present invention have realized that in many fields there are
interested people who have compiled their own listing of relevant sites and an analysis of each
relevant site; such listing sites are known as hubs. Thus, it is generally useful to provide a user
with a short list of such hubs. The inventors have also realized that reviewing such hubs by a
user may be a better way for the user to find a dependable and knowledgeable authority in a
field, than by merely relying on an automated program that analyzes links between sites.
Neglecting a search for potential authorities, in accordance with some embodiments of the
invention, can allow a faster method to be used for finding hubs. Once these hubs are
determined, there are other, further types of analysis that can be usefully presented to a user
and answer other information gathering questions the user might have.

Following are several methods of analyzing search results to assist an interested user
102 in finding one or a small number of relevant sites or hub sites. Although not explicitly
described in each of the below-described methods, additional filtering steps for rejecting
certain sites as being unsuitable may be provided. Also it is noted that a block portion of one
method may be suitable for inclusion, as is, in another method, as is also described in the
exemplary implementation method, described herein.

A user may have a particular question to which he desires an answer. However, the
words used in the question often do not match the words used in the field, or in the particular
site that holds the answer to the user's question. In some cases, there is no common way of
describing the subject of the question. Each hub can be considered, among other things, to be a
dictionary of synonyms. Once a user finds one hub, the common usage of names to describe
the subject of the question generally becomes clear.

FINDING HUBS

Fig. 2 is a flowchart 120 of a method for finding hubs, in accordance with an
exemplary embodiment of the invention.

First, a keyword search (122) is performed. Alternatively, other ways of locating a
plurality of sites related to the subject matter may be used, for example, the listing of links in
an existing hub may be used. Other methods of providing link lists are described below.
Optionally, the number of sites returned is limited, for example to 50.

An optional filtering step 124 may be performed to remove sites that are clearly
unsuitable, for example multiple results in a same domain. Other possible filtering rules
include: removing internal sites of other search engines, removing sites with low ratings or
based on the site size or creation/revision date. After filtering, there are N sites, where N may
be a result of the filtering or the filtering may be adapted to achieve a desired value for N.
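Filtering step 124 can be sketched as below, under stated assumptions. The function names, the `blocked_domains` parameter (standing in for internal sites of other search engines) and the `max_sites` cap used to reach a desired N are hypothetical; rating, size and date rules are omitted because the text gives no criteria for them.

```python
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Return the host part of a URL, tolerating URLs written without a scheme."""
    parsed = urlparse(url if "://" in url else "http://" + url)
    return parsed.netloc.lower()


def filter_results(urls: list[str], blocked_domains: set[str], max_sites: int) -> list[str]:
    """Keep the first result per domain, skip blocked domains, stop at max_sites."""
    seen: set[str] = set()
    kept: list[str] = []
    for url in urls:
        if len(kept) >= max_sites:
            break  # the filtering is adapted to achieve a desired N
        domain = domain_of(url)
        if domain in blocked_domains or domain in seen:
            continue  # multiple results in a same domain are dropped
        seen.add(domain)
        kept.append(url)
    return kept
```

Keeping the first result per domain preserves the search engine's own ordering, which matters if later grouping is based on the order in the search results.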

The filtered search results are then grouped into groups (126), of size K, for example
K=4. In some embodiments of the invention, K is a function of N. Alternatively or
additionally, K is a function of the results in the group, for example, one group may have a
small number of high ranking results while another group has a large number of low-ranking
results. It is not, however, required that all groups be the same size. Various grouping methods
may be used, for example, randomly selecting sites, based on order in search results (either
selecting blocks of sites, or selecting evenly or non-evenly spaced sites from the search
results), based on a ranking method, to create groups with balanced ranks or search order (e.g.,
two high and two low ranks) and/or grouping similar sites together. Optionally, the size of the
group may be inversely related to the ranking of sites in the groups.
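Two of the grouping methods mentioned above, selecting blocks of sites and selecting evenly spaced sites, can be sketched as follows. The function names are hypothetical and the sketch assumes the input list is already in search-result order.

```python
def group_blocks(sites: list[str], k: int) -> list[list[str]]:
    """Consecutive blocks of at most k sites, in search-result order."""
    return [sites[i:i + k] for i in range(0, len(sites), k)]


def group_strided(sites: list[str], k: int) -> list[list[str]]:
    """Evenly spaced selection, so each group mixes early and late results."""
    n_groups = -(-len(sites) // k)  # ceiling of N/K
    return [sites[g::n_groups] for g in range(n_groups)]
```

Both variants produce the same number of groups, so the number of subsequent searches is unchanged; only the mix of sites within each group differs.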

A plurality of potential hubs are determined in step 128, by searching for sites that
include links to any site in one of the groups. Only N/K searches are required. In an exemplary
search engine 106, for each group the search is for sites that include reference to or link to the
http address of at least one of the sites in the group, e.g. the search term being:
"www.site1.com OR www.site2.com OR www.site3.com/stuff OR www.site4.com".
Optionally, this search includes a request for ranking of search results by the search engine.
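Generating one disjunctive search term per group can be sketched as below. The function names are hypothetical, and a plain "OR" between addresses is taken from the example search term in the text; the exact operator syntax depends on the search engine used.

```python
def build_group_query(group: list[str]) -> str:
    """One disjunctive term matching pages that refer to any site in the group."""
    return " OR ".join(group)


def build_queries(sites: list[str], k: int) -> list[str]:
    """Split N sites into groups of size K and build one query per group."""
    groups = [sites[i:i + k] for i in range(0, len(sites), k)]
    return [build_group_query(group) for group in groups]
```

For N=8 sites and K=4 this yields two queries instead of eight, at the cost of not knowing which member of a group each returned page actually points to.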

The results of all the searches are collated and then optionally ranked (130). In an
exemplary ranking scheme, a two digit number is used, the tens being the number of searches
the site came up on and the ones being the existence and number of special keywords that
appear in the potential hub (to be described below). A four digit scheme may also be used.
Also, different ranking methods or different weights for the different factors may be used.
Exemplary special keywords are words that indicate that a site is more likely to be a hub
(described below) or words from the subject topic or from the original search. In some cases,
such topic words can be gleaned from the original search results (122), for example from the
page topics or provided by a user.
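The collation and the two digit ranking scheme can be sketched as follows. The function names are hypothetical, and capping each digit at 9 is an assumption, since the text does not say what happens when a site comes up in more than nine searches or contains more than nine special keywords.

```python
from collections import Counter


def collate(result_lists: list[list[str]]) -> Counter:
    """Count, for each page, how many group searches returned it."""
    hits: Counter = Counter()
    for results in result_lists:
        hits.update(set(results))  # a page counts once per search
    return hits


def two_digit_rank(search_hits: int, keyword_hits: int) -> int:
    """Tens digit: number of searches; ones digit: number of special keywords."""
    # Capping at 9 is an assumed way to keep each factor within one digit.
    return 10 * min(search_hits, 9) + min(keyword_hits, 9)


def rank_pages(result_lists: list[list[str]], keyword_hits: dict[str, int]) -> list[tuple[str, int]]:
    """Collate all result lists and sort pages by descending two digit rank."""
    hits = collate(result_lists)
    ranked = [(page, two_digit_rank(n, keyword_hits.get(page, 0))) for page, n in hits.items()]
    return sorted(ranked, key=lambda item: (-item[1], item[0]))
```

Because the number of searches occupies the tens digit, a page returned by more group searches always outranks a page returned by fewer, whatever its keyword count.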

In a step 132, a small number of hubs are selected for further consideration, for
example based on the ranking.

In an optional step 134, the selected hubs are filtered to remove sites that are not
desirable, for example based on an analysis of their content. Exemplary analysis rules that can
be applied are: counting the number of links from the site; counting the number of links which
also appear on other potential hubs; eliminating potential hubs which are almost identical to
other potential hubs. Typically, but not necessarily, a larger number of links indicates a more
desirable hub. If however, the number of links is too high, this may indicate an omnibus hub
that may be too difficult to use, if it is not organized. A later optional step of analyzing the
amount of content associated with each link and/or the organization of the links may be used
to determine if such an omnibus hub is suitable for the user.
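A rough sketch of filtering step 134 follows. The function names and all thresholds are hypothetical. Two simplifications are assumed: "almost identical" is measured by Jaccard similarity of the link sets, a measure the text does not name, and hubs above `max_links` are simply dropped, whereas the text leaves open a later analysis of whether an omnibus hub is still suitable.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Share of links common to two hubs, relative to all links on either."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def filter_hubs(hub_links: dict[str, set[str]], min_links: int, max_links: int,
                duplicate_threshold: float) -> list[str]:
    """Drop hubs with too few or too many links and near-duplicates of kept hubs."""
    kept: list[str] = []
    for hub, links in hub_links.items():
        if not (min_links <= len(links) <= max_links):
            continue  # too sparse, or a possible omnibus hub
        if any(jaccard(links, hub_links[other]) >= duplicate_threshold for other in kept):
            continue  # almost identical to a hub that was already kept
        kept.append(hub)
    return kept
```

The result depends on input order: of two near-identical hubs, the one listed first is kept, so passing the hubs in ranked order keeps the better-ranked copy.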

The filtered hubs are then presented to user 102 (136). In an exemplary embodiment of
the invention, the filtered hubs are presented as a list of links. Alternatively or additionally, the
sites may be provided along with auxiliary information, for example, information about link
structure, such as number of links, number of unique links (not in other pages), number of
popular links (on at least k pages or top q pages), amount of explanation for each link, method
of ordering of links in the page (alphabetic, topical, regional, ranked, etc.), information copied
from the target pages, such as the links themselves and/or explanations about the links.
Alternatively or additionally, the list may indicate or be separated into pages that include many
(or any) links to the user provided URL(s) and those pages that do not.
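Three of the link-structure statistics listed above (number of links, number of unique links, and number of popular links on at least k pages) can be computed as in the following sketch. The function name and the dictionary layout of the result are hypothetical, and each hub is assumed to be already reduced to a set of link targets.

```python
from collections import Counter


def link_statistics(hub_links: dict[str, set[str]], k: int) -> dict[str, dict[str, int]]:
    """Per hub: total links, links found on no other hub, links found on >= k hubs."""
    # Number of hubs on which each link target appears.
    page_count = Counter(link for links in hub_links.values() for link in links)
    stats: dict[str, dict[str, int]] = {}
    for hub, links in hub_links.items():
        stats[hub] = {
            "links": len(links),
            "unique": sum(1 for link in links if page_count[link] == 1),
            "popular": sum(1 for link in links if page_count[link] >= k),
        }
    return stats
```

Presented next to each hub, such counts let a user distinguish a hub that mostly repeats well-known links from one that contributes links found nowhere else.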

It is noted that in some embodiments of the invention hubs are found without finding
authorities at the same time. A potential advantage is obviating a need for an iterative process
to identify the hubs. In an exemplary embodiment of the invention, the step of determining an
authority is performed manually by a user browsing through a short list of found hubs, to see
which link in the hubs is suggested, by the hub, or by its contents, as a suitable authority. It is
expected that in many cases, this method of finding an authority will yield better results than
automatic determination of authorities, as many hubs are composed by experts and contain
many hints that an automated method might not grasp, while a human user will. In this context
it should be noted that many sites include links to other sites, for many reasons, which may
have nothing to do with searching for information. However, when a hub includes a list of
links, the list itself is often put together with some thought or logic.

In an exemplary embodiment of the invention, after relevant hubs for a field of interest
are found, these hubs may be analyzed to determine whether or not a particular target site is
pointed to by the found hubs. This determination may be used, for example, for updating the
hubs by the hub owner or a party offering a service of hub updating and supplementing.

Alternatively or additionally, the hubs are analyzed to detect and/or utilize
inconsistencies between a search engine index and the actual state of WWW sites in the world.
In one example, a new WWW site can be detected by finding it on a hub. In another example,
a search result listing can be analyzed to detect relevant or virtual hubs and then the links in
the hubs compared to the search results so as to provide a user with a list limited to new
sites.

Alternatively or additionally to using a keyword search as an input for a hub finding
method, the input list of sites may be a listing of possible competitors. Such a list may be
generated, for example, using a "find similar sites" feature common in many search engines, as
applied to a site of interest, or provided by a user or found automatically by analyzing
collections of sites. Optionally, the site of interest is indicated, in the search query, as desirable
not to be found in the resulting hubs. This allows the finding of hubs that should contain a link
to the site of interest but do not.

In a competition scenario, the most relevant hubs may be characterized, at least in part,
by the number of competition sites pointed to by the potential hubs.

One method of finding such hubs is to search for hubs that include links to any one of
the first N (e.g., 10) competitor sites. The search can be performed, for example, as shown in
Fig. 2 at 128, in which groups of sites are searched together, or potential hubs may be found for
each individual site. The results can be filtered, for example as described above. The links in the
found potential hubs can be collated and ranked. Optionally, if the provided or determined
competitor list is too short (e.g., below a threshold length or not providing reasonable results,
for example if the number of total links or hubs to the competitors as a group and/or as
individuals is greater than a threshold value), it may be augmented, for example with other
relevant sites, such as sites that include topic words found in the competition sites or provided
by the user.
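
Characterizing hubs by the number of competitor sites they point to can be sketched as
follows. This is only an illustration under stated assumptions: the mapping hub_links (hub URL
to the links found in that hub) is assumed to have been collected already, and the function
name rank_hubs_by_competitors is hypothetical.

```python
from collections import Counter


def rank_hubs_by_competitors(hub_links, competitors, top_n=10):
    """Rank potential hubs by how many of the first top_n competitors they link to.

    hub_links maps a hub URL to an iterable of URLs that the hub links to.
    Returns (hub, count) pairs, highest count first; hubs with no hits are dropped.
    """
    wanted = set(competitors[:top_n])
    counts = Counter()
    for hub, links in hub_links.items():
        hits = len(wanted.intersection(links))
        if hits:
            counts[hub] = hits
    # Sort by descending count, then by hub name for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))
```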

FIND MATCHING SET

Fig. 3 is a flowchart 160 of a method of finding hubs similar to a provided list of sites
and/or an existing hub, in accordance with an exemplary embodiment of the invention. The
similarity is embodied in a similarity of the content, type and/or other characteristics of links
from and/or to the sites. In some cases, sites that are similar to individual ones of the provided
sites are sought. In other cases, a found site is similar to a combination of the provided
example sites. This method is useful, for example, for finding a group of sites that may be of
interest to a particular user.

In a step 162, a list of links is provided. This list may be provided in many ways, for
example being gleaned from a provided set of sites to which similar sites are sought. In
another example, this list may comprise a list of "favorites" or bookmarks of a user, or the list
of links in such a list of favorites. In another example, the links are copied from the link list of
a particular hub.

In a step 164, the method of Fig. 2 is desirably applied to find potential hubs for the
links. Other methods may be used as well.

In an optional filtering step 166, some of the potential hubs are filtered out.

The resulting potential hubs may indicate other users whose interests are similar to
those of a user who provided a "favorites" list.
VARIATIONS

In the above description, several filtration methods are described. An additional
filtration method of hubs or sites can be based on the presence or lack of presence of topic
words. Even if a user does not provide such topic words to begin with, these words may be
automatically gleaned, for example from title or summary sections of relevant pages or by
analyzing the text of URLs. It is noted that some search engines can be controlled to search for
common words only in "summary" parts of the page or in anchor portions of the page (near
links).

Another filtration method takes into account the presence of links that are essentially
garbage links, such as promotions or advertising. These links may be repetitive, or only a part
of the link, for example the domain name, may be repeated. Optionally, a database of such links
and their fields is maintained, so that if these links are actually of interest this fact can be
determined by the field of search matching that in the database.

Another filtration method analyzes a site based on its hub-likeness. Such an analysis
may be based on the number and organization of links, existence of special sections titled, for
example, "additional links" and/or the use of words common in hubs, such as "links", "index",
and "resource".
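
A hub-likeness analysis of this kind might be sketched as below. The weights, the minimum
link count and the function name hub_likeness are illustrative assumptions; the description
names the signals but does not prescribe how they are combined.

```python
import re

HUB_WORDS = ("links", "index", "resource")   # words common in hubs
SECTION_TITLES = ("additional links",)       # special section titles


def hub_likeness(page_text: str, link_count: int, min_links: int = 20) -> int:
    """Return a small integer score; higher means the page looks more like a hub."""
    text = page_text.lower()
    score = 0
    if link_count >= min_links:              # many links on the page
        score += 2
    if any(title in text for title in SECTION_TITLES):
        score += 1
    words = set(re.findall(r"[a-z]+", text))
    # Count each hub word once, accepting a simple plural form as well.
    score += sum(1 for w in HUB_WORDS if w in words or w + "s" in words)
    return score
```

A page could then be kept as a potential hub only if its score exceeds a chosen threshold.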

A typical reason for a user searching for a hub (usually a plurality of hubs) is as a
starting point for searching on a subject X. Not all hubs are equally suitable. Desirably, a
"best" hub will optionally meet as many as possible of the following criteria:

(a) There are many links to relevant sites.

(b) The links are divided into meaningful categories.

(c) Each link is followed by some explanation concerning the site it points to.

These criteria can be checked using methods described herein, to rank hubs. In
particular, the division of the links into categories can be determined by clustering methods,
for example, by taking each group of links and finding whether there are many or few hubs that
include many of the links in the group.

As noted above, a search may be limited to a search in links of potential hubs or
selected (by a user or automatically) ones of the hubs. Some search engines, such as
"infoseek", include such a tool. Alternatively, a search can be limited by requiring the presence
of selected ones of the links. Typically, the length of the search clause is limited, so that it may
need to be repeated several times, each time with other of the relevant links. Alternatively or
additionally, searching may be performed using a smart agent or through a web site (or
software tool) dedicated to the application of the present invention.
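
Repeating a length-limited search with different links each time can be sketched as a simple
batching step. The "link:" term prefix follows the query notation used in the algorithms of
this description; the character limit and the function name batch_link_queries are assumptions
for illustration, as actual limits depend on the search engine used.

```python
def batch_link_queries(links, max_clause_len=200, prefix="link:"):
    """Pack link terms into as few query strings as fit within max_clause_len.

    A single term longer than the limit is still emitted as its own query.
    """
    queries = []
    current = ""
    for link in links:
        term = prefix + link
        candidate = term if not current else current + " " + term
        if current and len(candidate) > max_clause_len:
            queries.append(current)   # current batch is full; start a new one
            current = term
        else:
            current = candidate
    if current:
        queries.append(current)
    return queries
```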

In some embodiments, it is desirable to rank the results based on the number of links to
a site in the results. Some search engines provide such a result. In other search engines, the
number of displayed results can be limited to zero or one, so only the total count needs to be
looked at. Alternatively, a connection to the search engine is broken as soon as the number of
results is provided.

As noted above, the searching for potential hubs is optionally statistical, in that groups
of sites are treated as single units. However, in some cases it is useful to treat at least one hub
on an individual basis, for example if the hub is deemed to have been updated lately or based
on its rank. Such a hub may be retrieved and/or the hub may be compared to each of the links,
to see which links the hub actually includes.
APPLICATION OF THE ABOVE METHODS 

The above methods may be applied in many ways, only exemplary ones of which are
described below.

In one exemplary application, a service is provided to find hubs to which a WWW site
should belong. Money can be charged based on clicks to the site and/or purchases at the site
following travel through the links. The above analysis methods can be used, for example, to
suggest to the site and/or to the hubs the suitability of listing the site. A service provider may
sign contracts with the hubs to list sites at the service provider's request. The service provider
can also provide an indexing service of pointing to the hubs of interest in a particular field.
Unlike current Internet indexes that are all centralized, the service provider provides a
distributed index, of which the service provider may not own any part, but optionally controls
the existence of at least a limited number of relevant sites for the field that the index covers, in
the particular slant (role) of that distributed index site. Optionally, but not necessarily, at least
part of each distributed index part is arranged in a standardized format.

In an exemplary embodiment of the invention, the service provider can contact the
hubs, to determine that additional site links are desired, and/or the sites, to determine that
additional listings in hubs are desired. Once either an interested site owner or hub owner is
determined, a partner (hub or site) for completing the listing transaction can usually be found.
Alternatively, competing hubs may be set up by the service provider.

In one exemplary implementation, the service provider provides the list of sites as a file
including also promotional material, so that the site owner will copy the link list with the
promotional material. Alternatively or additionally, the links provided are not links directly to
the targets; instead, the link passes through a server controlled by the service provider, for
example to track access for charging purposes, which server can, for example, add promotional
material and/or assure that the site links are up-to-date.

It is noted that one or more of the tasks of mapping sites, personalization of search
engines, ranking of relevant sites and/or alerting to new sites may be provided in a
significantly more efficient manner than known in the art using the methods described above,
in some embodiments of the invention.

As part of competitive hubs or as a service at participating hubs, the service provider
can update the site listing periodically, to reflect the changes in the Internet. Alternatively, a
virtual hub (based on the method of Fig. 2, for example), may be generated ad hoc, at a user's
request. The listing may be generated in real-time or it may be updated periodically, for
example once a week.

SPECIFIC ALGORITHMS 

In a particular implementation of the invention, which includes some of the above
described methods, the following algorithms are implemented. Comments provided for a
particular method step are not repeated for all the methods. First described are component
algorithms. Then, composite algorithms that build on the component algorithms are described.

Finding potential hubs

This algorithm corresponds generally to the method of Fig. 2.

Name   : Centers(T)
Input  : a set, T, of one or more target urls.
Output : a set of hubs that link to many urls in T.

Algorithm:

1. T' = filter(T) <delete blacklist, long url>. The source set T is filtered, for
   example removing long URLs and URLs on a black list.
2. Partition T' into sets of size K: t1, ..., tn, using partitionmethod. The set is divided
   into small groups of URLs.
3. Define Link-to(ti) = all non internal links to ti.
4. For each i, compute Link-to'(ti)
   i.   Choose searchengine/settings
   ii.  Submit query: link:ti1 , ... , link:tiK | topic "links" "index" "resources"
   iii. Terminate query after timeout
5. RawScore(c) = |{ti | c ∈ Link-to'(ti), i=1,...,n}|. This is a measure of whether the
   site acts like a hub (number of links).
6. TitleScore(c): This is a measure of whether the site looks like a hub.
   i.   {"links" "index" "resources"} = +3
   ii.  one topic word = +3; two = +5; three = +6
7. Score(c) = 10*RawScore(c) + TitleScore(c)
8. Centers'(T) = {c | Score(c) > scorethreshold, Score(c) in top rankthreshold}
9. Centers(T) = Filter(Centers'(T)) <for each family keep highest score, delete
   dated>. Various hubs are removed, for example old hubs and mirror sites of other
   hubs.

Note: for T = single url: skip 1, 2, 4iii; RawScore = 1

Parameters: K, partitionmethod, searchengine/settings, timeout, scorethreshold, rankthreshold.
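
The scoring steps of the Centers algorithm can be sketched as follows. This is a minimal
illustration under stated assumptions, not a definitive implementation: link_to is a
hypothetical caller-supplied function standing in for the search engine query of step 4, titles
is an assumed mapping from candidate URL to page title, the default parameter values are
arbitrary, and the final filtering of step 9 (families, dated hubs) is omitted.

```python
from collections import defaultdict


def partition(urls, k):
    """Step 2: split the target list into consecutive groups of at most k URLs."""
    return [urls[i:i + k] for i in range(0, len(urls), k)]


def title_score(title, topic_words):
    """Step 6: +3 for a hub word in the title; +3/+5/+6 for one/two/three topic words."""
    words = set(title.lower().split())
    score = 3 if words & {"links", "index", "resources"} else 0
    hits = sum(1 for w in topic_words if w.lower() in words)
    return score + {0: 0, 1: 3, 2: 5}.get(hits, 6)


def centers(targets, link_to, titles, topic_words, k=5,
            score_threshold=10, rank_threshold=10,
            blacklist=(), max_url_len=100):
    """Return (candidate, score) pairs for potential hubs, best first."""
    # Step 1: drop blacklisted and overly long URLs.
    filtered = [u for u in targets if u not in blacklist and len(u) <= max_url_len]
    # Steps 2-5: RawScore counts the groups that each candidate links into.
    raw = defaultdict(int)
    for group in partition(filtered, k):
        for candidate in set(link_to(group)):
            raw[candidate] += 1
    # Step 7: Score(c) = 10*RawScore(c) + TitleScore(c).
    scores = {c: 10 * n + title_score(titles.get(c, ""), topic_words)
              for c, n in raw.items()}
    # Step 8: keep candidates above the score threshold, limited to the top ranks.
    ranked = sorted((c for c in scores if scores[c] > score_threshold),
                    key=lambda c: (-scores[c], c))
    return [(c, scores[c]) for c in ranked[:rank_threshold]]
```

Because each group of URLs is queried as a single unit, a candidate's RawScore is bounded by
the number of groups rather than by the number of target URLs.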

Finding potential hubs missing a link

This algorithm is one of the variations described with reference to Fig. 2, for finding hubs that
do not point to a particular site that belongs to a topic of the hubs.



Name   : CentersAbsent(T, url)
Input  : a set, T, of one or more target urls; url
Output : a set of hubs which link to many urls in T but don't link to url.

Algorithm:

1. T' = filter(T) <delete blacklist, long url>
2. Partition T' into sets of size K: t1, ..., tn, using partitionmethod
3. Define Link-to(ti) = all non internal links to ti.
4. For each i, compute Link-to-absent(ti)
   i.   Choose searchengine/settings
   ii.  Submit query: -link:url link:ti1 , ... , link:tiK | topic "links" "index"
        "resources" (Topic can be a parameter or it may be determined from the
        search results)
   iii. Terminate query after timeout
5. RawScore(c) = |{ti | c ∈ Link-to-absent(ti), i=1,...,n}|
6. TitleScore(c):
   i.   {"links" "index" "resources"} = +3
   ii.  one topic word = +3; two = +5; three = +6
7. Score(c) = 10*RawScore(c) + TitleScore(c)
8. Centers'(T) = {c | Score(c) > scorethreshold, Score(c) in top rankthreshold}
9. Centers(T) = Filter(Centers'(T)) <delete dated>

Parameters: K, partitionmethod, searchengine/settings, timeout, scorethreshold, rankthreshold

Finding sites relevant to a topic

This algorithm allows a user to selectively provide a list of sites or a topic, as an input into the
other methods.

Name   : Relevant(topic)
Input  : topic
Output : sites about topic

1. Get top k sites on searchengine
2. Filter <remove duplicates, junk>

Parameters: searchengine, k

Finding hubs about a topic

This algorithm is one implementation of the method of Fig. 2.

Input  : topic
Output : Hubs on topic

Algorithm:

1. Compute Relevant(topic)
2. Compute Centers(Relevant(topic, searchengine))



Find similar hubs

This algorithm identifies hubs that are similar to a known hub, relate to a same subject field
and thus are useful as a starting point for searching.



Name   : SimilarHubs (for single hub c)
Input  : hub c, topic
Output : Similar hubs

Algorithm:

1. Compute Targets(c) = external links in c
2. If |Targets(c)| = 20 - j
   i.   Get Relevant(topic) <top j>
   ii.  Targets(c) = Targets(c) ∪ Relevant(topic)
3. If |Targets(c)| = k > 200. A parameter of the method.
   i.   Targets'(c) = |Targets(c)| / floor(|Targets(c)|/100) of Targets(c) <Select periodically>
4. Else Targets'(c) = Targets(c)
5. Compute Centers(Targets'(c))
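
The periodic selection of step 3 can be sketched as taking every floor(k/100)-th link of a
long target list. The function name select_targets and its parameter names are hypothetical;
the values 200 and 100 are those appearing in the algorithm above.

```python
def select_targets(targets, limit=200, base=100):
    """Thin a long list of target links by keeping every step-th one.

    Lists of at most `limit` links are returned unchanged (step 4 of the algorithm).
    """
    k = len(targets)
    if k <= limit:
        return list(targets)
    step = k // base                  # floor(|Targets(c)| / 100)
    return list(targets[::step])      # roughly k / step links, evenly spaced
```

With these values the thinned list contains between about 100 and 200 links, which keeps the
number of search queries issued by Centers bounded.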

Find potential hubs that should list a site

This algorithm identifies hubs that do not point to a particular site (Fig. 2).



Name   : PlaceTarget
Input  : url, topic
Output : hub sites not linked to url

Algorithm:

1. Get Competitors(url)
2. For each t ∈ Competitors(url) compute CentersAbsent(t, url)
3.
4. If |CentersAbsent(t, url)| < 20
      Compute Relevant(topic)
5.    Compute CentersAbsent(Relevant(topic), url)
6. PlaceLink(url) =
   [CentersAbsent(t, url) ∪ CentersAbsent(Relevant(topic), url)] - Similar(Link-to(url))
7. If c ∈ PlaceLink(url) but c ∉ CentersAbsent(t, url)
      Score(c) = 10*RawScore(c) (in CentersAbsent(Relevant(topic), url))

PHYSICAL IMPLEMENTATION

Search result analyzer 110 may be implemented in various ways, optionally without
limiting its ability to provide the services described above.

In one example, search analyzer 110 is integrated with search engine 106, possibly in a
same computer or in a LAN thereof.

In another example, search analyzer 110 is a separate WWW server that contacts search
engine 106 via the Internet or directly.

In another example, search analyzer 110 is, at least in part, a client software executing
on user 102. This client software may be permanent or it may be a network programmed, e.g.
Java, applet that is downloaded by user 102 at need.

It will be appreciated that the above described methods of hub and site finding may be
varied in many ways, including, changing the order of steps, which steps are performed on-line
or off-line, such as table or index preparation, and the exact implementation used, which can
include various hardware and software combinations. In addition, a multiplicity of various
features has been described. It should be appreciated that different features may be combined
in different ways. In particular, not all the features are necessary in every preferred
embodiment of the invention. Software as described herein is preferably provided on a
computer readable media, such as a diskette or an optical disk. Alternatively or additionally, it
may be stored on a computer, for example in a main memory or on a hard disk, both of which
are also computer readable media. Where methods have been described, also computer
hardware programmed to perform the methods is within the scope of the description. When
used in the following claims, the terms "comprises", "includes", "have" and their conjugates
mean "including but not limited to".

It will be appreciated by a person skilled in the art that the present invention is not
limited by what has thus far been described. Rather, the scope of the present invention is
limited only by the following claims.